改进的TF-IDF关键词提取方法 (Improved TF-IDF Keyword Extraction Algorithm)

Similar resources

The DF-ICF Algorithm- Modified TF-IDF

Tf-idf is an algorithm generally used in large-scale data processing. It assigns a weight to a particular term within a document, and that weight is proportional to the importance of the term. This paper aims to use the idea behind the tf-idf algorithm to design the df-icf algorithm, which finds the importance of a particular document within the given corpus. General Terms DF-IC...
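
As a rough illustration of the term weighting described above, the following minimal sketch computes a common textbook form of tf-idf: relative term frequency multiplied by the logarithm of inverse document frequency. The function name `tf_idf`, the toy corpus, and the exact formula variant are assumptions made for illustration; they are not taken from the DF-ICF paper or from the improved method in the title.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["keyword", "extraction", "with", "tf", "idf"],
    ["tf", "idf", "weights", "terms"],
    ["clustering", "cells", "with", "tf", "idf"],
]
scores = tf_idf(corpus)
```

In this variant a term that occurs in every document (here `tf` and `idf`) receives weight zero, while a term confined to one document (such as `keyword`) receives the largest weight for its frequency.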

Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing

This paper proposes a method to improve shift-reduce constituency parsing by using lexical dependencies. The lexical dependency information is obtained from a large amount of auto-parsed data that is generated by a baseline shift-reduce parser on unlabeled data. We then incorporate a set of novel features defined on this information into the shift-reduce parsing model. The features can help to ...

A Parallel Pages Mining Approach: Combining URL Patterns and HTML Structures

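Liu Qi, Liu Yang, Sun Maosong (State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084) Abstract: Parallel corpora are a foundational data resource that underpins applications such as machine translation and cross-language information retrieval. Although the number of parallel web pages on the Internet is huge and keeps growing, the heterogeneity and complexity of parallel websites make it a major challenge to acquire high-quality parallel pages quickly and automatically in order to build parallel corpora. This paper proposes a parallel page acquisition method that combines URL patterns with HTML structures: it first uses HTML structure to access parallel pages recursively, then uses URL patterns to optimize the topological order in which a parallel website is traversed, achieving efficient and accurate acquisition of parallel pages. Experiments on two parallel websites, those of the United Nations and the Hong Kong government, show that compared with traditional acquisition methods our method reduces acquisition time by more than 50%, improves accuracy by 15%, and significantly improves machine translation quality (BLEU increases by 1.6 and 0.7 points, respectively). Keywords: parallel page acquisition; parallel corpus; URL...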

Clustering scRNA-Seq Data using TF-IDF

In this abstract, we propose several computational approaches for clustering scRNA-Seq data based on the Term Frequency Inverse Document Frequency (TF-IDF) transformation that has been successfully used in the field of text analysis. Empirical evaluation on simulated cell mixtures with different levels of complexity suggests that the TF-IDF methods consistently outperform existing scRNA-Seq clu...
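
To make the idea of applying TF-IDF to expression data concrete, here is a minimal sketch that treats each cell as a "document" and each gene as a "term" in a cell-by-gene count matrix. The function name `tfidf_transform`, the toy counts, and the specific formula (relative count times the logarithm of inverse detection frequency) are assumptions for illustration only; the abstract does not state which TF-IDF variant or clustering step the authors use.

```python
import math


def tfidf_transform(counts):
    """Apply an assumed TF-IDF variant to a cell-by-gene count matrix.

    counts: list of rows (cells), each a list of per-gene counts.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # Number of cells in which each gene is detected (count > 0).
    detected = [sum(1 for row in counts if row[g] > 0) for g in range(n_genes)]
    # Genes never detected get idf 0 to avoid division by zero.
    idf = [math.log(n_cells / d) if d else 0.0 for d in detected]
    transformed = []
    for row in counts:
        total = sum(row)
        transformed.append(
            [(c / total) * idf[g] if total else 0.0 for g, c in enumerate(row)]
        )
    return transformed


# Hypothetical counts: 3 cells (rows) by 3 genes (columns).
cells = [
    [5, 0, 1],
    [4, 0, 2],
    [0, 3, 1],
]
cell_scores = tfidf_transform(cells)
```

Under this variant a gene detected in every cell contributes nothing, so genes expressed in only a subset of cells dominate the transformed values that a downstream clustering step would see.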

Deriving TF-IDF as a Fisher Kernel

The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standard models such as the multinomial distribution. This paper investigates the DCM Fisher kernel, a function for comparing documents derived from the DCM. We show that the DCM Fisher kernel has components that are similar...

Journal

Journal title: Computer Science and Application

Year: 2013

ISSN: 2161-8801,2161-881X

DOI: 10.12677/csa.2013.31012